An Introduction to Autotuning Parallel Applications
نویسنده
چکیده
منابع مشابه
Application-independent Autotuning for GPUs
Autotuning is an established technique for adjusting performance-critical parameters of applications to their specific run-time environment. In this paper, we investigate the potential of online autotuning for general purpose computation on GPUs. Our application-independent autotuner AtuneRT optimizes GPU-specific parameters such as block size and loop-unrolling degree. We also discuss the pecu...
متن کاملTowards Autotuning of OpenMP Applications on Multicore Architectures
In this paper we describe an autotuning tool for optimization of OpenMP applications on highly multicore and multithreaded architectures. Our work was motivated by in-depth performance analysis of scientific applications and synthetic benchmarks on IBM Power 775 architecture. The tool provides an automatic code instrumentation of OpenMP parallel regions. Based on measurement of chosen hardware ...
متن کاملCompiler-based code generation and autotuning for geometric multigrid on GPU-accelerated supercomputers
GPUs, with their high bandwidths and computational capabilities are an increasingly popular target for scientific computing. Unfortunately, to date, harnessing the power of the GPU has required use of a GPU-specific programming model like CUDA, OpenCL, or OpenACC. As such, in order to deliver portability across CPU-based and GPU-accelerated supercomputers, programmers are forced to write and ma...
متن کاملpOSKI: An Extensible Autotuning Framework to Perform Optimized SpMVs on Multicore Architectures
We have developed pOSKI: the Parallel Optimized Sparse Kernel Interface – an autotuning framework to optimize Sparse Matrix Vector Multiply (SpMV) performance on emerging shared memory multicore architectures. Our autotuning methodology extends previous work done in the scientific computing community targeting serial architectures. In addition to previously explored parallel optimizations, we f...
متن کاملPiecewise holistic autotuning of parallel programs with CERE
Current architecture complexity requires fine tuning of compiler and runtime parameters to achieve best performance. Autotuning substantially improves default parameters in many scenarios but it is a costly process requiring long iterative evaluations. We propose an automatic piecewise autotuner based on CERE (Codelet Extractor and REplayer). CERE decomposes applications into small pieces calle...
متن کامل